Floresta Sintá(c)tica: Bigger, Thicker and Easier

Authors

  • Cláudia Freitas
  • Paulo Rocha
  • Eckhard Bick
Abstract

In this paper, we describe the resumption of activities of Floresta Sintá(c)tica, a treebank for Portuguese. We present some underlying guidelines around the project and how they influence our linguistic choices. We then describe the new texts added to the treebank, proceed to mention the new syntactic information added to the old texts, and finally describe the new user-friendly search system and the plans for its expansion.

Similar resources

Floresta Sintá(c)tica: A treebank for Portuguese

This paper reviews the first year of the creation of a publicly available treebank for Portuguese, Floresta Sintá(c)tica, a collaboration project between the VISL and the Computational Processing of Portuguese projects. After briefly describing the main goals and the organization of the project, the creation of the annotated objects is presented in detail: preparing the text to be annotated, ap...

Automatic Semantic-Role Annotation for Portuguese

The paper presents and evaluates a parsing system for the automatic annotation of Portuguese text with semantic role tags. All in all, 38 different categories, like agent, patient, location etc. are distinguished. The annotator uses a grammar of 500 hand-written Constraint Grammar rules and exploits syntactic dependency links as well as semantic prototype classes and syntactic function. The int...

Adaptation of Data and Models for Probabilistic Parsing of Portuguese

We present the first results for recovering word-word dependencies from a probabilistic parser for Portuguese trained on and evaluated against human annotated syntactic analyses. We use the Floresta Sintá(c)tica with the Bikel multi-lingual parsing engine and evaluate performance on both PARSEVAL and unlabeled dependencies. We explore several configurations, both in terms of parameterizing the ...

Constraint Grammar-based conversion of Dependency Treebanks

This paper presents a new method for the conversion of one style of dependency treebanks into another, using contextual, Constraint Grammar-based transformation rules for both structural changes (attachment) and changes in syntactic-functional tags (edge labels). In particular, we address the conversion of traditional syntactic dependency annotation into the semantically motivated dependency ann...

Part-of-Speech Tagging of Portuguese Using Hidden Markov Models with Character Language Model Emissions

This paper presents a probabilistic approach to POS tagging that combines HMMs and character language models, applied to Portuguese texts. In this approach, the emission probabilities for each hidden state in an HMM are estimated by a proper character language model. The tagger built has been trained and tested on Bosque, a subset of the Floresta Sintá(c)tica treebank, reaching 96.2% accuracy ...

Publication date: 2008